Machine Learning Chunks

LedaK

29 May, 2021

P1. Posterior Probability Distribution

Task: Consider now that x is distributed as x ~ N(μ, 16); we believe that the prior for the mean is μ ~ N(0, 4). Use the distribution N(7, 16) to generate observations for x.

  1. Develop an algorithm that estimates the posterior distribution’s mean and variance, assuming we have N = 1, 5, 10, 20, 50, 100, and 1000 observations available, respectively.
  2. For every N, provide a diagram that shows the prior distribution, the distribution generating the data, and the estimated posterior distribution.

Implementation:

The algorithm implemented is fairly straightforward: it applies the formulas for the posterior mean and variance derived above.
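The update behind this algorithm is the standard conjugate-Gaussian posterior with known likelihood variance. A minimal sketch, assuming the task's N(μ, 16) and N(0, 4) specify variances of 16 and 4 (the function name and seed are illustrative):

```python
import numpy as np

def posterior(x, mu0=0.0, var0=4.0, var_lik=16.0):
    """Posterior mean and variance of mu given observations x,
    with prior mu ~ N(mu0, var0) and likelihood x ~ N(mu, var_lik)."""
    n = len(x)
    var_n = 1.0 / (1.0 / var0 + n / var_lik)                  # posterior variance
    mu_n = var_n * (mu0 / var0 + n * np.mean(x) / var_lik)    # posterior mean
    return mu_n, var_n

rng = np.random.default_rng(0)
for n in [1, 5, 10, 20, 50, 100, 1000]:
    x = rng.normal(7.0, np.sqrt(16.0), size=n)  # data generated from N(7, 16)
    mu_n, var_n = posterior(x)
    print(n, mu_n, var_n)
```

As N grows, the posterior mean drifts from the prior mean 0 toward the data-generating mean 7, and the posterior variance shrinks toward 0.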


Running the algorithm produces the following graphs:

[Figure: prior distribution, data-generating distribution, and estimated posterior distribution for each value of N (N = 5, 10, 20, 50, 100, 1000)]

P2. Polynomial Curve Fitting

Task: Draw one period of the sinusoidal function y(x) = sin(2πx) and select N samples for x uniformly distributed in the interval [0, 1]. To each y(x) value, add Gaussian noise distributed as N(0, 1) to generate a set of observations.

  • Fit a polynomial model of degree M = 2, 3, 4, 5, or 9 to the noisy observations and provide a table with the coefficients of the best least-squares fit and the achieved RMSE.
  • Provide a plot showing the function y(x), the observations drawn, and the best fit model for every different value of M.
  • Repeat the above procedure for N=10 and N=100.

Implementation:

The algorithm was implemented for three different sample sizes, N = 25, 10, and 100, and for models of degree M = 2, 3, 4, 5, and 9. The coefficients of the polynomial model were estimated via the least-squares method.
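The fitting step can be sketched as follows (a minimal illustration for N = 25 and M = 3; variable names and the seed are illustrative, not the report's actual code):

```python
import numpy as np

# Noisy samples of sin(2*pi*x), as described in the task.
rng = np.random.default_rng(0)
N, M = 25, 3
x = rng.uniform(0.0, 1.0, size=N)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 1.0, size=N)  # noise ~ N(0, 1)

# Design matrix with columns [1, x, x^2, ..., x^M].
Phi = np.vander(x, M + 1, increasing=True)

# Least-squares estimate of the polynomial coefficients.
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
rmse = np.sqrt(np.mean((Phi @ w - t) ** 2))
```

Repeating this for each (N, M) pair yields the coefficient tables below.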

N = 25

Weights for N=25

| Coefficient | M=2       | M=3       | M=4         | M=5         | M=9         |
|-------------|-----------|-----------|-------------|-------------|-------------|
| w0          | 1.551613  | -1.13713  | -0.2993927  | -1.780131   | -1.499296   |
| w1          | -5.243373 | 25.37546  | 12.1534910  | 42.439316   | 34.864005   |
| w2          | 3.201364  | -66.77343 | -12.9503574 | -199.220992 | -132.150306 |
| w3          | NA        | 43.32511  | -35.4599000 | 429.877872  | 174.062662  |
| w4          | NA        | NA        | 37.7025812  | -466.311479 | -14.101311  |
| w5          | NA        | NA        | NA          | 196.851378  | -113.825112 |
| w6          | NA        | NA        | NA          | NA          | -40.200131  |
| w7          | NA        | NA        | NA          | NA          | 70.458280   |
| w8          | NA        | NA        | NA          | NA          | 86.706776   |
| w9          | NA        | NA        | NA          | NA          | -62.496632  |

[Figure: y(x), noisy observations, and best-fit models for N = 25]

N = 10

Weights for N=10

| Coefficient | M=2       | M=3        | M=4        | M=5        | M=9         |
|-------------|-----------|------------|------------|------------|-------------|
| w0          | 0.342139  | -1.656795  | -2.436627  | -2.009874  | -3.134021   |
| w1          | 2.468431  | 22.022072  | 34.079037  | 26.614343  | 43.727136   |
| w2          | -4.080392 | -48.872120 | -97.669662 | -59.999460 | -118.178395 |
| w3          | NA        | 28.343264  | 99.483133  | 20.552445  | 55.051207   |
| w4          | NA        | NA         | -34.032065 | 39.376230  | 82.728812   |
| w5          | NA        | NA         | NA         | -25.132615 | 11.075251   |
| w6          | NA        | NA         | NA         | NA         | -56.311970  |
| w7          | NA        | NA         | NA         | NA         | -69.493005  |
| w8          | NA        | NA         | NA         | NA         | -21.053940  |
| w9          | NA        | NA         | NA         | NA         | 76.968799   |

[Figure: y(x), noisy observations, and best-fit models for N = 10]

N = 100

Weights for N=100

| Coefficient | M=2        | M=3        | M=4         | M=5         | M=9        |
|-------------|------------|------------|-------------|-------------|------------|
| w0          | 1.4275230  | 0.098293   | -0.0150933  | -0.340045   | -0.249578  |
| w1          | -3.3739325 | 10.675388  | 12.4554136  | 20.179368   | 16.389805  |
| w2          | 0.9378136  | -31.876898 | -39.1585183 | -90.886316  | -53.898997 |
| w3          | NA         | 20.985681  | 31.7178893  | 167.869434  | 38.019581  |
| w4          | NA         | NA         | -5.1755582  | -157.587446 | 22.747595  |
| w5          | NA         | NA         | NA          | 60.809248   | -8.326979  |
| w6          | NA         | NA         | NA          | NA          | -20.004700 |
| w7          | NA         | NA         | NA          | NA          | -13.345086 |
| w8          | NA         | NA         | NA          | NA          | 1.858392   |
| w9          | NA         | NA         | NA          | NA          | 17.120006  |

[Figure: y(x), noisy observations, and best-fit models for N = 100]

P3. Predictive Bayesian

Task: For the same setup as in P2 above, assume that the observations are generated as t = y(x) + η, where y(x) = sin(2πx) and the Gaussian noise η is distributed as N(0, β⁻¹) with β = 11.1. You are given a dataset generated in this way with N = 10 samples (x, t), where 0 < x < 1. Assume that you want to fit to the data a regression model of the form t = g(x, w) + η, where g(x, w) is a polynomial of degree M = 9 whose coefficient vector w follows a normal prior distribution with precision α = 0.005 (Bayesian approach).

Construct the predictive model that, for every unseen x (not in the training set), produces a prediction t. Plot the mean m(x) and variance s²(x) of the predictive Gaussian model for many different values of x in the interval 0 < x < 1. What do you observe? Discuss your findings.

Implementation:

In order to fully incorporate the Bayesian approach for this task, the following formulas were taken into account:

  • For the covariance matrix S_N of the posterior distribution over the coefficients: \({{\bf\mathcal{S}}_N^{-1} = \alpha{\bf I} + \beta{\bf\Phi}^T{\bf\Phi}}\)
  • For the mean of the posterior distribution over the coefficients: \({{\bf m}_N = \beta{\bf\mathcal{S}}_N{\bf\Phi}^T{\bf t}}\)
  • For the variance of the predictive distribution at a test point x: \({\sigma_N^2(x) = \frac{1}{\beta} + \phi(x)^T{\bf\mathcal{S}}_N \phi(x)}\)
  • For the mean of the predictive distribution at a test point x: \({m_N(x) = {\bf m}_N^T \phi(x)}\)

The four equations were combined into a single function, bayes.predict, which computes the predictive distribution. The approach was tested for various numbers of observations: N = 10, 50, 100, 200, and 1000.
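A hypothetical Python analogue of such a function, combining the four formulas above with α, β, and M from the task (the names `bayes_predict` and `phi` are illustrative, not the report's actual code):

```python
import numpy as np

def bayes_predict(x_train, t_train, x_test, M=9, alpha=0.005, beta=11.1):
    """Predictive mean and variance of the Bayesian polynomial model."""
    def phi(x):
        # Feature matrix with rows phi(x) = [1, x, ..., x^M].
        return np.vander(np.atleast_1d(x), M + 1, increasing=True)

    Phi = phi(x_train)
    S_inv = alpha * np.eye(M + 1) + beta * Phi.T @ Phi   # S_N^{-1}
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ t_train                       # m_N

    Phi_test = phi(x_test)
    mean = Phi_test @ m                                  # m_N(x) = m_N^T phi(x)
    var = 1.0 / beta + np.sum(Phi_test @ S * Phi_test, axis=1)  # sigma_N^2(x)
    return mean, var

# Illustrative usage on N = 10 synthetic samples.
rng = np.random.default_rng(1)
x_tr = rng.uniform(0.0, 1.0, size=10)
t_tr = np.sin(2 * np.pi * x_tr) + rng.normal(0.0, 1.0 / np.sqrt(11.1), size=10)
m_grid, s2_grid = bayes_predict(x_tr, t_tr, np.linspace(0.0, 1.0, 100))
```

Note that the predictive variance is never below the noise floor 1/β, and it grows in regions of the interval with few training points.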

[Figures: predictive mean m(x) and variance s²(x) for N = 50, 100, 200, 1000]